API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs
https://arxiv.org/abs/2304.08244
we introduce API-Bank, a groundbreaking benchmark, specifically designed for tool-augmented LLMs (Abstract)
Three open questions
(1) How effective are current LLMs in utilizing tools?
we develop a runnable evaluation system consisting of 73 API tools.
We annotate 314 tool-use dialogues with 753 API calls to assess the existing LLMs' capabilities in planning, retrieving, and calling APIs.
(2) How can we enhance LLMs' ability to utilize tools?
we construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains.
Lynx was trained on this dataset, starting from Alpaca
(3) What obstacles need to be overcome to leverage tools?
future research (drawn from the error analysis of Lynx)
https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/api-bank
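To make the idea of a "runnable evaluation system" for question (1) concrete, here is a minimal sketch of how a predicted API call could be scored against an annotated one by actually executing both. Everything in it is an assumption for illustration: the two toy tools, the `[Name(key='value')]` call syntax, and the helper names `parse_call`, `run_call`, and `call_accuracy` are hypothetical and are not taken from the API-Bank code. The real evaluation logic lives in the repository linked above.

```python
import ast

# Hypothetical toy tools (NOT the 73 APIs shipped with the benchmark).
def add_agenda(content: str, time: str) -> str:
    return f"added:{content}@{time}"

def query_weather(city: str) -> str:
    return {"Tokyo": "sunny"}.get(city, "unknown")

TOOLS = {"AddAgenda": add_agenda, "QueryWeather": query_weather}

def parse_call(text: str):
    """Parse an assumed "[Name(key='value', ...)]" string into (name, kwargs)."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("call must be wrapped in square brackets")
    node = ast.parse(text[1:-1], mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("not a simple function call")
    # literal_eval only accepts literals, so no arbitrary code is evaluated.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs

def run_call(text: str):
    """Execute a parsed call against the toy tool registry."""
    name, kwargs = parse_call(text)
    return TOOLS[name](**kwargs)

def call_accuracy(predicted, gold):
    """Fraction of predicted calls whose execution result matches the gold call."""
    correct = 0
    for pred, ref in zip(predicted, gold):
        try:
            ok = run_call(pred) == run_call(ref)
        except Exception:
            # Unparseable calls, unknown tools, or bad arguments count as wrong.
            ok = False
        correct += ok
    return correct / len(gold)

if __name__ == "__main__":
    gold = [
        "[QueryWeather(city='Tokyo')]",
        "[AddAgenda(content='meeting', time='11:00')]",
    ]
    predicted = [
        "[QueryWeather(city='Tokyo')]",
        "[AddAgenda(content='meeting', time='10:00')]",
    ]
    print(call_accuracy(predicted, gold))
```

Comparing execution results rather than raw strings means a prediction is not penalized for argument order or quoting style, which is one reason to make the tools runnable at all. Whether the benchmark scores calls exactly this way is not something these notes establish.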